BIVARIATE REGRESSION WHEN BOTH VARIABLES ARE RANDOM


In dealing with certain estimation problems in biological and
chemical research it is frequently necessary to compute a regression
equation which can be used to predict values of a variable Y for selected
values of another variable X. The standard procedure calls for selecting
a fixed set of values of X and then sampling Y. If this procedure
is followed, then the resulting regression equation is

$$ Y' = a_1 + b_1 X, \tag{1} $$

where

$$ b_1 = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) \Big/ \sum_{i=1}^{n} (X_i - \bar{X})^2, \tag{2} $$

$$ a_1 = \bar{Y} - b_1 \bar{X}. \tag{3} $$

This family of equations is classically used in computing the regression
equation for Y on X.

If, on the other hand, it is desired to select a set of Y values,
sample X and then construct the regression function for X on Y, the
resulting equation is

$$ X' = a_2 + b_2 Y, \tag{4} $$

where

$$ b_2 = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) \Big/ \sum_{i=1}^{n} (Y_i - \bar{Y})^2, \tag{5} $$

$$ a_2 = \bar{X} - b_2 \bar{Y}. \tag{6} $$

The above formulas are obtained using the method of least squares.

Furthermore, if one is willing to assume that for the first situation Y
is normally distributed with a common variance about the regression
line, and for the second situation X is normally distributed with a common
variance about the regression line, then the estimates above are
also maximum likelihood estimates.
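Equations (1) through (6) can be sketched in a few lines of Python. This is only an illustration: the function name `least_squares_lines` and the sample points are made up here and are not part of the memo.

```python
def least_squares_lines(xs, ys):
    """Return (a1, b1, a2, b2) for Y' = a1 + b1*X and X' = a2 + b2*Y."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squares and cross products about the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b1 = sxy / sxx            # equation (2)
    a1 = y_bar - b1 * x_bar   # equation (3)
    b2 = sxy / syy            # equation (5)
    a2 = x_bar - b2 * y_bar   # equation (6)
    return a1, b1, a2, b2


# Made-up points, for illustration only (not data from the memo).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
a1, b1, a2, b2 = least_squares_lines(xs, ys)
print(f"Y' = {a1:.3f} + {b1:.3f} X")
print(f"X' = {a2:.3f} + {b2:.3f} Y")
```

Since the product b1*b2 equals the squared correlation coefficient, the two fitted lines coincide only when the points lie exactly on a straight line.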

Unfortunately, in practice it is not always possible to control the
independent variable X or Y, as the case may be. In these situations,
then, both variables will be subject to error, or random variation. (For
example, in estimating a dose-response function, both the dose and the
proportion responding to that dose may be random variables, since dose
frequently cannot be measured precisely.) When such a sampling situation
arises it seems advisable to consider using orthogonal regression
techniques. In this case the sum of squares of the perpendicular
distances to the regression line will be minimized. If, furthermore,
the variances for X and Y can be standardized in some sense, then
the estimates which follow are also maximum likelihood estimates. The
following equations assume that X will be used to predict Y. An interchange
of the X and Y values will make it possible to arrive at the
equation for predicting X from Y. The necessary formulas are:

$$ Y' = a_3 + b_3 X, \tag{7} $$

where

$$ b_3 = \frac{\sum y_i^2 - \sum x_i^2 + \sqrt{\left(\sum y_i^2 - \sum x_i^2\right)^2 + 4\left(\sum x_i y_i\right)^2}}{2 \sum x_i y_i}, \tag{8} $$

$$ a_3 = \bar{Y} - b_3 \bar{X}, \tag{9} $$

and

$$ y_i = (Y_i - \bar{Y}), \quad x_i = (X_i - \bar{X}), \tag{10} $$

with all sums running over i = 1, ..., n.

(Two references on estimation when both variables are subject to random
error appear at the end of this memo.)
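A minimal Python sketch of the orthogonal slope and intercept follows. It takes the slope formula to be the one computed by the attached program (the BHAT statement). The function name `orthogonal_line` and the sample points are illustrative assumptions, not part of the memo.

```python
import math


def orthogonal_line(xs, ys):
    """Return (a3, b3) for the orthogonal regression line Y' = a3 + b3*X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxy == 0.0:
        # The slope formula divides by 2*sum(x*y), so it is undefined here.
        raise ValueError("sum of cross products is zero; slope undefined")
    diff = syy - sxx
    b3 = (diff + math.sqrt(diff * diff + 4.0 * sxy * sxy)) / (2.0 * sxy)
    a3 = y_bar - b3 * x_bar
    return a3, b3


# Made-up points, for illustration only (not data from the memo).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
a3, b3 = orthogonal_line(xs, ys)
a4, b4 = orthogonal_line(ys, xs)   # interchange X and Y to predict X from Y
print(f"Y' = {a3:.3f} + {b3:.3f} X")
print(f"X' = {a4:.3f} + {b4:.3f} Y")
```

Interchanging the variables gives a slope that is the reciprocal of b3, so the two orthogonal equations describe the same geometric line, written once in terms of X and once in terms of Y.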
An Example

To illustrate the kinds of results which can be obtained, the following
example has been selected from Reference 3. The data represent the
heights and weights of 12 men:

X (height in inches):   60   60   60   62   62   62   62   64   64   70   70   70

Y (weight in pounds):  110  135  120  120  140  130  135  150  145  170  185  160

The four regression equations are now summarized:

(1) Regression of Y on X, standard:

Y' = -179.36 + 5.029 X

(2) Regression of X on Y, standard:

X' = 40.61 + .164 Y

(3) Regression of Y on X, orthogonal:

Y' = -245.57 + 6.066 X

(4) Regression of X on Y, orthogonal:

X' = 40.48 + .165 Y.
For this particular illustration it should be noted that the results
for one case (X on Y) are quite close together, while for the other case
(Y on X) the differences could be significant in any inferential treatment
of the data.
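The four equations of the example can be checked with a short Python script. The sketch below uses the height and weight data as given; the variable names are chosen here for illustration, and the orthogonal slopes use the formula computed in the attached program.

```python
import math

# Heights (inches) and weights (pounds) of the 12 men in the example.
X = [60, 60, 60, 62, 62, 62, 62, 64, 64, 70, 70, 70]
Y = [110, 135, 120, 120, 140, 130, 135, 150, 145, 170, 185, 160]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n
sxx = sum((x - x_bar) ** 2 for x in X)
syy = sum((y - y_bar) ** 2 for y in Y)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))

# Standard least-squares lines.
b1 = sxy / sxx
a1 = y_bar - b1 * x_bar
b2 = sxy / syy
a2 = x_bar - b2 * y_bar

# Orthogonal lines; the radical is shared by both slopes.
radical = math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)
b3 = ((syy - sxx) + radical) / (2.0 * sxy)
a3 = y_bar - b3 * x_bar
b4 = ((sxx - syy) + radical) / (2.0 * sxy)
a4 = x_bar - b4 * y_bar

print(f"(1) Y on X, standard:   Y' = {a1:.2f} + {b1:.3f} X")
print(f"(2) X on Y, standard:   X' = {a2:.2f} + {b2:.3f} Y")
print(f"(3) Y on X, orthogonal: Y' = {a3:.2f} + {b3:.3f} X")
print(f"(4) X on Y, orthogonal: X' = {a4:.2f} + {b4:.3f} Y")
```

Run on these data, the script agrees with the four summarized equations to the precision printed in the memo.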

Summary

This memo presents the method of orthogonal regression and compares
the standard linear regression procedures with orthogonal linear
regression procedures. Care should be exercised in using the standard
methods when both X and Y are subject to random variation.

The computer program description and the FORTRAN program are
attached.
A. IDENTIFICATION

Title: Orthogonal Regression

Identification:

Category:

Programmer: Freida E. Robey

Date: October, 1963

B. PURPOSE - This program computes the mean of X ($\bar{X}$), the mean
of Y ($\bar{Y}$), the correlation coefficient (R), the orthogonal
regression lines for Y on X and X on Y, and the sum of the minimum
residuals for each line.

C. USAGE

1. Operational Procedure

This program is in FORTRAN.

(a) Machine load Compiler tape III (the interpreter) at
P = 0000. Check sum = 0000.

(b) Clear, position the binary object tape in the reader
and run (from P = 0000).

Error Stop: P = 0052 Parity error stop. Usually
indicates punch trouble.

(c) The FORTRAN object program is in memory and ready
to be executed. Turn on the punch, position input tape
(data) in the reader and run (from P = 1020).


2. Data

The first value on the data tape is N, the number of
pairs of data. The next values on the data tape are the X and
Y pairs.

Format      Definition

I3          N

2F20.8      X, Y
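As a rough illustration of this layout, the Python sketch below formats a data set as text records: N right-justified in 3 columns (I3), then each X, Y pair in two 20-column fields with 8 decimal places (2F20.8). The function name `format_data_tape` is hypothetical, one pair per record is an assumption, and the limit of 100 pairs is taken from the DIMENSION statement in the attached program. How such records map onto the physical paper tape is not covered here.

```python
def format_data_tape(pairs):
    """Render (x, y) pairs as text records in the I3 / 2F20.8 layout."""
    if len(pairs) > 100:
        # The attached program dimensions X and Y at 100 elements.
        raise ValueError("the program holds at most 100 pairs")
    records = [f"{len(pairs):3d}"]                  # I3: N
    for x, y in pairs:
        records.append(f"{x:20.8f}{y:20.8f}")       # 2F20.8: X, Y
    return "\n".join(records)


# First three height/weight pairs from the memo's example.
print(format_data_tape([(60.0, 110.0), (60.0, 135.0), (60.0, 120.0)]))
```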


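3. Output

The output is punched in flexowriter code, and includes
$\bar{X}$, $\bar{Y}$, R (the correlation coefficient), the equations of the
regression lines Y on X and X on Y, and the sum of the minimum
residuals. The equations used for computation are:

$$ \hat{b} = \frac{\sum y^2 - \sum x^2 + \sqrt{\left(\sum y^2 - \sum x^2\right)^2 + 4\left(\sum xy\right)^2}}{2 \sum xy}, $$

where

$$ y = Y_i - \bar{Y}, \quad x = X_i - \bar{X}, \quad a = \bar{Y} - \hat{b}\bar{X}. $$

Regression line Y on X is:

$$ Y' = a + \hat{b} X. $$

$$ \hat{a} = \frac{\sum x^2 - \sum y^2 + \sqrt{\left(\sum x^2 - \sum y^2\right)^2 + 4\left(\sum xy\right)^2}}{2 \sum xy}, \quad a_1 = \bar{X} - \hat{a}\bar{Y}. $$

Regression line X on Y is:

$$ X' = a_1 + \hat{a} Y. $$

Minimum residuals:

$$ \sum_i \frac{(Y_i - a - \hat{b} X_i)^2}{1 + \hat{b}^2}, \qquad \sum_i \frac{(X_i - a_1 - \hat{a} Y_i)^2}{1 + \hat{a}^2}. $$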
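The division by one plus the squared slope is what turns vertical residuals into perpendicular ones: the perpendicular distance from a point to the line Y = a + bX is the vertical residual divided by the square root of 1 + b^2. A small Python sketch, with a hypothetical function name and a made-up two-point check:

```python
def perpendicular_sum_of_squares(xs, ys, a, b):
    """Sum of squared perpendicular distances from (x, y) to Y = a + b*X."""
    # Vertical residuals squared, then rescaled by 1 + b^2.
    vertical_ss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return vertical_ss / (1.0 + b * b)


# Made-up check: the line Y = X and the points (0, 1) and (1, 0).
# Each point lies 1/sqrt(2) from the line, so each squared distance is 0.5.
total = perpendicular_sum_of_squares([0.0, 1.0], [1.0, 0.0], a=0.0, b=1.0)
print(total)  # prints 1.0
```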
C     ORTHOGONAL REGRESSION
   10 FORMAT (I3)
   11 FORMAT (2F20.8)
   12 FORMAT (6HXBAR=;,F14.8,7H;YBAR=;,F14.8,7H;;;;R=;,F14.8/)
   13 FORMAT (20HREGRESSION;EQUATIONS/)
   14 FORMAT (17HREGRESSION;Y;ON;X/)
   15 FORMAT (8HYPRIME=;,F14.8,4H;+;;,F14.8,1HX/)
   16 FORMAT (17HREGRESSION;X;ON;Y/)
   17 FORMAT (8HXPRIME=;,F14.8,4H;+;;,F14.8,1HY/)
   18 FORMAT (17HMINIMUM;RESIDUALS/)
   19 FORMAT (19HSUM;D(I);SQUARED;=;,F14.8/)

      DIMENSION X(100), Y(100)
    1 XES=0
      YES=0
      SUMX=0
      SUMY=0
      SUMXY=0
      SUMIN1=0
      SUMIN2=0
      READ 10, N
      READ 11,(X(I),Y(I),I=1,N)
C     COMPUTE SUMS
      DO 20 I=1,N
      XES=XES+X(I)
   20 YES=YES+Y(I)
      XBAR=XES/N
      YBAR=YES/N
C     CENTER THE DATA AND ACCUMULATE SUMS OF SQUARES AND PRODUCTS
      DO 25 I=1,N
      Y(I)=Y(I)-YBAR
      X(I)=X(I)-XBAR
      SUMX=SUMX+X(I)*X(I)
      SUMY=SUMY+Y(I)*Y(I)
   25 SUMXY=SUMXY+X(I)*Y(I)

C     COMPUTE REGRESSION COEFFS
      DIFSSQ=SUMY-SUMX
      RADCAL=SQRTF(DIFSSQ*DIFSSQ+4.*SUMXY*SUMXY)
      DENOM=2.*SUMXY
      BHAT=(DIFSSQ+RADCAL)/DENOM
      A=YBAR-BHAT*XBAR
      DIFSSQ=SUMX-SUMY
      AHAT=(DIFSSQ+RADCAL)/DENOM
      A1=XBAR-AHAT*YBAR
C     COMPUTE R
C     BHAT*AHAT IS IDENTICALLY 1 FOR THE ORTHOGONAL SLOPES, SO R IS
C     TAKEN FROM THE SUMS OF SQUARES AND PRODUCTS INSTEAD
      R=SUMXY/SQRTF(SUMX*SUMY)


C     COMPUTE MINIMUM RESIDUALS
      DENOM=1.+BHAT*BHAT
      DO 40 I=1,N
      Y(I)=Y(I)+YBAR
      X(I)=X(I)+XBAR
      BNUM=Y(I)-A-BHAT*X(I)
      BNUM2=BNUM*BNUM
   40 SUMIN1=SUMIN1+BNUM2
      SUMIN1=SUMIN1/DENOM
      DENOM1=1.+AHAT*AHAT
      DO 50 I=1,N
      ANUM=X(I)-A1-AHAT*Y(I)
      ANUM2=ANUM*ANUM
   50 SUMIN2=SUMIN2+ANUM2
      SUMIN2=SUMIN2/DENOM1
      PUNCH 12,XBAR,YBAR,R
      PUNCH 13
      PUNCH 14
      PUNCH 15,A,BHAT
      PUNCH 16
      PUNCH 17,A1,AHAT
      PUNCH 18
      PUNCH 19,SUMIN1
      PUNCH 19,SUMIN2
      PAUSE 0001
      GO TO 1
      END
      END


